449 research outputs found
Shrewd Selection Speeds Surfing: Use Smart EXP3!
In this paper, we explore the use of multi-armed bandit online learning
techniques to solve distributed resource selection problems. As an example, we
focus on the problem of network selection. Mobile devices often have several
wireless networks at their disposal. While choosing the right network is vital
for good performance, a decentralized solution remains a challenge. The
impressive theoretical properties of multi-armed bandit algorithms, like EXP3,
suggest that it should work well for this type of problem. Yet, its real-world
performance lags far behind. The main reasons are the hidden cost of switching
networks and its slow rate of convergence. We propose Smart EXP3, a novel
bandit-style algorithm that (a) retains the good theoretical properties of
EXP3, (b) bounds the number of switches, and (c) yields significantly better
performance in practice. We evaluate Smart EXP3 using simulations, controlled
experiments, and real-world experiments. Results show that it stabilizes at the
optimal state, achieves fairness among devices and gracefully deals with
transient behaviors. In real-world experiments, it can achieve 18% faster
downloads than alternative strategies. We conclude that multi-armed bandit
algorithms can play an important role in distributed resource selection
problems, when practical concerns, such as switching costs and convergence
time, are addressed. Comment: Full pape
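For readers unfamiliar with the baseline the abstract refers to, the following is a minimal sketch of the standard EXP3 algorithm, not of Smart EXP3, whose details are not given in this listing. The function name `exp3`, the parameter `gamma`, and the callback `reward_fn` are illustrative assumptions; rewards are assumed to lie in [0, 1], and each arm can be read as one candidate network.

```python
import math
import random


def exp3(reward_fn, n_arms, n_rounds, gamma=0.1, seed=0):
    """Standard EXP3 (illustrative sketch; not the paper's Smart EXP3).

    reward_fn(arm, rng) is a hypothetical callback returning a reward in [0, 1].
    Returns how many times each arm was selected.
    """
    rng = random.Random(seed)
    weights = [1.0] * n_arms
    pulls = [0] * n_arms
    for _ in range(n_rounds):
        total = sum(weights)
        # Mix the weight-proportional distribution with uniform exploration.
        probs = [(1 - gamma) * w / total + gamma / n_arms for w in weights]
        arm = rng.choices(range(n_arms), weights=probs)[0]
        reward = reward_fn(arm, rng)
        # Importance-weighted estimate: only the chosen arm is updated.
        estimate = reward / probs[arm]
        weights[arm] *= math.exp(gamma * estimate / n_arms)
        # Rescale so the largest weight is 1, which avoids overflow.
        top = max(weights)
        weights = [w / top for w in weights]
        pulls[arm] += 1
    return pulls


# Example: arm 1 (say, the faster network) pays a higher reward on average.
pulls = exp3(lambda arm, rng: [0.2, 0.8][arm], n_arms=2, n_rounds=2000)
```

Note that this baseline may switch arms in any round and charges nothing for doing so, which is the practical weakness the abstract attributes to EXP3.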
Similarity-Driven Cluster Merging Method for Unsupervised Fuzzy Clustering
In this paper, a similarity-driven cluster merging method is proposed for unsupervised fuzzy clustering. The cluster merging method is used to resolve the problem of cluster validation. Starting with an overspecified number of clusters in the data, pairs of similar clusters are merged based on the proposed similarity-driven cluster merging criterion. The similarity between clusters is calculated by a fuzzy cluster similarity matrix, while an adaptive threshold is used for merging. In addition, a modified generalized objective function is used for prototype-based fuzzy clustering. The function includes the p-norm distance measure as well as principal components of the clusters. The number of the principal components is determined automatically from the data being clustered. The performance of this unsupervised fuzzy clustering algorithm is evaluated by several experiments on an artificial data set and a gene expression data set. Singapore-MIT Alliance (SMA
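The merge loop described above can be sketched roughly as follows. The abstract does not define its similarity matrix or its adaptive threshold, so this sketch substitutes two assumptions: a fuzzy Jaccard similarity (sum of minimum memberships over sum of maximum memberships) and a fixed, caller-supplied threshold. The function names are hypothetical.

```python
def fuzzy_similarity(u_i, u_j):
    """Fuzzy Jaccard similarity of two membership vectors (an assumed measure)."""
    num = sum(min(a, b) for a, b in zip(u_i, u_j))
    den = sum(max(a, b) for a, b in zip(u_i, u_j))
    return num / den if den else 0.0


def merge_similar_clusters(memberships, threshold):
    """Repeatedly merge the most similar pair of clusters.

    memberships: one row per cluster, one membership degree per data point.
    threshold: fixed cut-off (the paper uses an adaptive one, not shown here).
    """
    clusters = [list(row) for row in memberships]
    while len(clusters) > 1:
        best, pair = -1.0, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                s = fuzzy_similarity(clusters[i], clusters[j])
                if s > best:
                    best, pair = s, (i, j)
        if best < threshold:
            break  # no remaining pair is similar enough to merge
        i, j = pair
        # The merged cluster takes the summed memberships of both parents.
        merged = [a + b for a, b in zip(clusters[i], clusters[j])]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters


# Three initial clusters over four points; the first two overlap completely.
start = [[0.5, 0.5, 0.0, 0.0], [0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]
result = merge_similar_clusters(start, threshold=0.5)
```

In this toy input the two identical clusters are merged and the third is left alone, so an overspecified count of three settles at two.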
Optimization of Analytic Window Functions
Analytic functions represent the state-of-the-art way of performing complex
data analysis within a single SQL statement. In particular, an important class
of analytic functions that has been frequently used in commercial systems to
support OLAP and decision support applications is the class of window
functions. A window function returns for each input tuple a value derived from
applying a function over a window of neighboring tuples. However, existing
window function evaluation approaches are based on a naive sorting scheme. In
this paper, we study the problem of optimizing the evaluation of window
functions. We propose several efficient techniques, and identify optimization
opportunities that allow us to optimize the evaluation of a set of window
functions. We have integrated our scheme into PostgreSQL. Our comprehensive
experimental study on the TPC-DS datasets as well as synthetic datasets and
queries demonstrates a significant speedup over existing approaches. Comment: VLDB201
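To make the definition of a window function concrete, here is a small sketch of the naive sort-based evaluation the abstract mentions, not the optimized techniques of the paper. It computes a running sum, roughly what SUM(value) OVER (PARTITION BY p ORDER BY o ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) would return. The function name, the dictionary row format, and the column names are illustrative assumptions.

```python
from itertools import groupby
from operator import itemgetter


def running_sum(rows, partition_key, order_key, value_key):
    """Naive sort-then-scan evaluation of a running-sum window function."""
    # One sort on (partition, order) places each partition's tuples
    # next to each other and in window order.
    ordered = sorted(rows, key=lambda r: (r[partition_key], r[order_key]))
    out = []
    for _, group in groupby(ordered, key=itemgetter(partition_key)):
        total = 0
        for row in group:
            total += row[value_key]
            # Each input tuple yields one output tuple with the window value.
            out.append({**row, "running_sum": total})
    return out


sales = [
    {"store": "a", "day": 2, "sales": 5},
    {"store": "b", "day": 1, "sales": 7},
    {"store": "a", "day": 1, "sales": 3},
]
result = running_sum(sales, "store", "day", "sales")
# running_sum values in output order: 3, 8, 7
```

Evaluating several window functions with different partitioning or ordering keys this way would need one full sort per function, which is the kind of repeated work the abstract's optimizations aim to reduce.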
PeerDB-Peering into Personal Databases
In this talk, we will present the design and evaluation of PeerDB, a peer-to-peer (P2P) distributed data sharing system. PeerDB distinguishes itself from existing P2P systems in several ways. First, it is a full-fledged data management system that supports fine-grained content-based searching. Second, it facilitates sharing of data without a shared schema. Third, it brings the power of mobile agents into P2P systems to perform operations at peers' sites. Fourth, the PeerDB network is self-configurable, i.e., a node can dynamically optimize the set of peers that it can communicate with directly based on some optimization criterion. Singapore-MIT Alliance (SMA
Disseminating streaming data in a dynamic environment: an adaptive and cost-based approach
In a distributed stream processing system, streaming data are continuously disseminated from the sources to the distributed processing servers. To enhance the dissemination efficiency, these servers are typically organized into one or more dissemination trees. In this paper, we focus on the problem of constructing dissemination trees to minimize the average loss of fidelity of the system. We observe that existing heuristic-based approaches can only explore a limited solution space and hence may lead to sub-optimal solutions. In contrast, we propose an adaptive and cost-based approach. Our cost model takes into account both the processing cost and the communication cost. Furthermore, as a distributed stream processing system is vulnerable to inaccurate statistics, runtime fluctuations of data characteristics, server workloads, and network conditions, we have designed our scheme to be adaptive to these situations: an operational dissemination tree may be incrementally transformed into a more cost-effective one. Our adaptive strategy employs decisions made independently by the distributed servers based on localized statistics collected by each server at runtime. For a relatively static environment, we also propose two static tree construction algorithms relying on a priori system statistics. These static trees can also be used as initial trees in a dynamic environment. We apply our schemes to both single- and multi-object dissemination. Our extensive performance study shows that the adaptive mechanisms are effective in a dynamic context and the proposed static tree construction algorithms perform close to optimal in a static environmen